[zipsync] Add new tool to efficiently pack and unpack cache entries#5361
Merged
dmichon-msft
approved these changes
Sep 16, 2025
Member
Author
I've removed the integration into the build cache. Using zipsync without a worker pool ends up slower than tar+gzip because of the overhead of booting up a worker and the Node require calls, so we would need to redesign some things to use a worker pool.
Contributor
dmichon-msft
left a comment
Would love to see more JSDoc and code comments around the binary-heavy parts, especially.
dmichon-msft
approved these changes
Sep 27, 2025
Contributor
dmichon-msft
left a comment
Couple minor notes left.
Summary
zipsync is a single-purpose tool that packs and unpacks build cache entries as zip archives.
Details
Unpack
Pack
Supported compression types are store (no compression), deflate (level 9), and auto (switches between store and deflate based on file extension).
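As an illustration of the auto mode described above, here is a minimal sketch of choosing between store and deflate per file. The extension list, the type names, and the function names are all assumptions made for this example; they are not taken from zipsync's source.

```typescript
// Hypothetical sketch: names and the extension list are assumptions, not zipsync's actual code.
type CompressionMode = 'store' | 'deflate' | 'auto';
type EntryMethod = 'store' | 'deflate';

// Assumed examples of formats that are already compressed, so deflating them again gains little.
const ALREADY_COMPRESSED: string[] = ['.png', '.jpg', '.gz', '.zip', '.woff2'];

function getExtension(filePath: string): string {
  const baseName = filePath.slice(filePath.lastIndexOf('/') + 1);
  const dotIndex = baseName.lastIndexOf('.');
  // No dot, or only a leading dot (e.g. ".gitignore"), is treated as "no extension".
  return dotIndex > 0 ? baseName.slice(dotIndex).toLowerCase() : '';
}

function chooseMethod(mode: CompressionMode, filePath: string): EntryMethod {
  if (mode !== 'auto') {
    // Explicit modes apply to every entry.
    return mode;
  }
  // In auto mode, store files that look already compressed and deflate everything else.
  return ALREADY_COMPRESSED.indexOf(getExtension(filePath)) >= 0 ? 'store' : 'deflate';
}
```

With this sketch, chooseMethod('auto', 'assets/logo.png') picks store, while chooseMethod('auto', 'lib/index.js') picks deflate.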
Constraints
Archives created by zipsync can be read by other zip-compatible programs, but the reverse is not true: zipsync implements only a subset of zip features to achieve greater performance.
What's wrong with the current setup?
The current setup cleans target directories when unpacking; then the build cache entry is unpacked. This setup ends up deleting and rewriting a lot of the same files.
Pros
With tar + gzip, files are archived first and compressed second. This lets the compression work across file boundaries, so content duplicated across files compresses efficiently.
Cons
Because compression is the last step, the archive must be decompressed before its contents can be inspected or enumerated.
It does not clean the target directory, so an rm -rf step is required.
Requirements
zipsync was created with the following constraints in mind:
Optimize for partial unpack scenario
Optimize for unpack performance. Most of the files in a build cache entry already exist on disk, and there is a good chance that they are already in the expected state.
Only write files when needed
This will minimize the number of write syscalls. Also, if the kernel has already cached the file from a recent read, the cache remains intact if we don't needlessly delete and rewrite the file.
Clean extra files and directories
This will remove the need to run rm -rf on the target directories, which saves more time.
Disallow symlinks
Symlinks in build cache entries are not supported. This will remove the need to scan the target directories for symlinks before running tar.
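The "only write files when needed" and "clean extra files and directories" requirements can be pictured as a single planning step. The sketch below is illustrative only: it assumes each file is identified by a content hash, which this description does not state, and the SyncPlan type and planSync function are hypothetical names rather than zipsync's API.

```typescript
// Hypothetical sketch: the types, names, and hash-based comparison are assumptions for illustration.
type FileHashes = Readonly<Record<string, string>>; // relative path -> content hash

interface SyncPlan {
  write: string[]; // in the archive, but missing or different on disk
  skip: string[]; // already on disk in the expected state
  remove: string[]; // on disk, but not in the archive
}

function hasKey(map: FileHashes, key: string): boolean {
  return Object.prototype.hasOwnProperty.call(map, key);
}

function planSync(archive: FileHashes, disk: FileHashes): SyncPlan {
  const plan: SyncPlan = { write: [], skip: [], remove: [] };
  for (const path of Object.keys(archive)) {
    if (hasKey(disk, path) && disk[path] === archive[path]) {
      plan.skip.push(path); // unchanged: no write needed
    } else {
      plan.write.push(path); // missing or stale: must be written
    }
  }
  for (const path of Object.keys(disk)) {
    if (!hasKey(archive, path)) {
      plan.remove.push(path); // extra file: cleaned without a separate rm -rf pass
    }
  }
  return plan;
}

const plan = planSync(
  { 'a.js': 'h1', 'b.js': 'h2', 'c.js': 'h3' },
  { 'a.js': 'h1', 'b.js': 'old', 'stale.js': 'h9' }
);
// plan.write is ['b.js', 'c.js'], plan.skip is ['a.js'], plan.remove is ['stale.js']
```

In the partial-unpack scenario, most paths land in skip, so the number of writes depends on how many files changed rather than on the size of the cache entry.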
Why zip
zip was picked because:
How it was tested
node apps/rush/lib/start-dev.js --debug build --verbose -t module-minifier
Benchmark Results
This document contains performance measurements for packing and unpacking a synthetic dataset using tar, zip, and zipsync.
The dataset consists of two directory trees (subdir1, subdir2) populated with 1000 text files each.
zipsync scenarios
zip and tar scenarios clean the unpack directory before unpacking. This time is included in the measurements because
zipsync internally handles cleaning as part of its operation.
System
Iterations: 100
Compressed (baseline: tar-gz)
Unpack Phase
Pack Phase
Uncompressed (baseline: tar)
Unpack Phase
Pack Phase